The Parameterized Complexity of Enumerating Frequent Itemsets
Abstract
A core problem in data mining is enumerating frequently occurring itemsets in a given set of transactions. The search and enumeration versions of this problem have recently been proven NP-hard and #P-hard, respectively (Gunopulos et al., 2003), and the known algorithms all have running times whose exponential terms are functions of the size of the largest transaction in the input, the size of the largest itemset in the output, or both. In this paper, we analyze the complexity of the size-k frequent itemset enumeration problem relative to a variety of parameterizations. Many of our hardness results are proved using a recent extension of parameterized complexity to solution-counting problems (McCartin, 2002). These results include hardness for versions of this problem based on restricted transaction-set structure. We also derive a collection of fixed-parameter algorithms using off-the-shelf parameterized algorithm design techniques, several of which suggest new algorithmic directions for the frequent itemset enumeration problem.
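To make the size-k enumeration problem concrete, the following is a minimal brute-force sketch, not one of the algorithms from the paper. It assumes that an itemset counts as "frequent" when it is contained in at least `min_support` transactions; the function name, the `min_support` parameter, and the sample data are all illustrative assumptions.

```python
from itertools import combinations


def frequent_itemsets_of_size_k(transactions, k, min_support):
    """Return (itemset, support) pairs for every size-k itemset that is
    contained in at least min_support transactions.

    Assumption: "frequent" means support >= min_support, where support
    is the number of transactions containing the itemset.
    """
    transaction_sets = [frozenset(t) for t in transactions]
    # Sorted universe of items, so the output order is deterministic.
    items = sorted(set().union(*transaction_sets))
    result = []
    # Brute force: try every k-subset of the item universe.
    for candidate in combinations(items, k):
        candidate_set = frozenset(candidate)
        support = sum(1 for t in transaction_sets if candidate_set <= t)
        if support >= min_support:
            result.append((candidate, support))
    return result


# Hypothetical market-basket data, for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "eggs", "milk"],
]
pairs = frequent_itemsets_of_size_k(transactions, k=2, min_support=3)
print(pairs)
```

With n distinct items, this sketch examines all C(n, k) candidate itemsets, so its running time grows exponentially in k. That dependence on a single parameter is the kind of behavior a parameterized analysis sets out to classify.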
Similar resources
An Efficient Method for Mining Frequent Itemsets in Market Basket Data Analysis
Discovery of hidden and valuable knowledge from large data warehouses is an important research area and has attracted the attention of many researchers in recent years. Most Association Rule Mining (ARM) algorithms start by searching for frequent itemsets by scanning the whole database repeatedly and enumerating the occurrences of each candidate itemset. In data mining problems, the size of ...
LCM over ZBDDs: Fast Generation of Very Large-Scale Frequent Itemsets Using a Compact Graph-Based Representation
(Abstract) Frequent itemset mining is one of the fundamental techniques for data mining and knowledge discovery. In the last decade, a number of efficient algorithms for frequent itemset mining have been presented, but most of them focused on just enumerating the itemsets which satisfy the given conditions, and it was a different matter how to store and index the mining result for efficient dat...
An Efficient Polynomial Delay Algorithm for Pseudo Frequent Itemset Mining
Mining frequently appearing patterns in a database is a basic problem in informatics, especially in data mining. In particular, when the input database is a collection of subsets of an itemset, the problem is called the frequent itemset mining problem, and it has been extensively studied. In real-world use, one of the difficulties of frequent itemset mining is that real-world data is often incorrec...
A New Efficient Method for Mining Frequent Itemsets in Market Basket Data Analysis
Discovery of hidden and valuable knowledge from large data warehouses is an important research area and has attracted the attention of many researchers in recent years. Most Association Rule Mining (ARM) algorithms start by searching for frequent itemsets by scanning the whole database repeatedly and enumerating the occurrences of each candidate itemset. In data mining problems, the size of ...
Transaction Databases, Frequent Itemsets, and Their Condensed Representations
Mining frequent itemsets is a fundamental task in data mining. Unfortunately, the number of frequent itemsets describing the data is often too large to comprehend. This problem has been attacked by condensed representations of frequent itemsets that are subcollections of frequent itemsets containing only the frequent itemsets that cannot be deduced from other frequent itemsets in the subcollecti...